home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
ftp.cs.arizona.edu
/
ftp.cs.arizona.edu.tar
/
ftp.cs.arizona.edu
/
icon
/
newsgrp
/
group93c.txt
/
000026_icon-group-sender _Thu Jul 22 12:40:36 1993.msg
< prev
next >
Wrap
Internet Message Format
|
1994-02-02
|
4KB
Received: by cheltenham.cs.arizona.edu; Thu, 22 Jul 1993 12:27:11 MST
Message-Id: <9307221702.AA20596@uu3.psi.com>
Date: Thu, 22 Jul 93 12:40:36 EDT
From: Jerry Leichter <leichter@lrw.com>
To: ICON-GROUP@cs.arizona.edu
Subject: RE: More on Icon performance
X-Vms-Mail-To: UUCP%"walter!flaubert!norman@uunet.uu.net"
Status: R
Errors-To: icon-group-errors@cs.arizona.edu
I want to create dags, so that I need to check to see if the node I'm
about to create already exists. The best way I know of is to create a
string that is a function of the node's elements, then look for that
string in a table. Does anyone else know a better way?
Here's what I'm doing:
procedure newitem(nt, dotpos, cat)
static cache
initial { cache := table() }
s := nt || " " || image(cat) || " " || dotpos
/cache[s] := item(nt, dotpos, cat)
return cache[s]
end
Can anyone suggest something better? I'm especially interested in
something less wasteful of string space.
What you are using is an inefficient variation of a classic, efficient tech-
nique used in compilers that goes under the name of value numbering. It
arises in common subexpression removal. Suppose you had a tree representing
an expression, and wished to find all common subexpressions within it. A
common subexpression would be a subtree whose leaves were all leaves of the
original tree.
Assign to each node in the tree a value as follows: If the node is a leaf,
assign it a unique value based on what the leaf contains. (In the compiler
context, a leaf is either a variable or a constant. The value represents
the particular value or constant.) If the node is of the form [v1,...,vn],
where the vi's are the value numbers of then node's n children, assign a
value f(v1,...,vn), which is unique among all the value numbers chosen for
the leaves, and among all values for f with different sets of arguments.
Then two nodes represent common subexpressions if and only if they have the
same value number.
What you are doing is using "value numbers" that are not numbers, but strings;
and you are using image() for f(), and at the leaves. While this works, as
you note it's expensive. If you use numerical values, it's much easier to
build small data strutures in which you can look things up quickly. (f() is
almost always computed by (a) looking up [v1,...,vn] in an appropriate table -
usually a hash table, but some kind of trie might make sense; (b) if the node
isn't already in the table, incrementing a counter and assigning its value as
the value number. Given the nature of the leaves in a compiler, it's usually
easy to come up with simple value numbers for them directly.)
You'll need some way to distinguish between leaves, which you could still
look up using image(), and interior nodes; and you'd need some representation
of [v1,...,vn]. It's unclear from your code, but if indeed you are only
concerned with binary nodes, and your data structures aren't too large, then
you might be able to get away with 16-bitf value numbers, and represent
[v1,v2] as 2^16*v1+v2, which can be computed and looked up in a table very
quickly. More generally, a list would probably make more sense.
If certain nodes in the tree have appropriate mathematical properties, it's
possible to adjust the value numbering algorithm to make use of them. For
example, if we consider "+" to be commutative, so that "a+b" and "b+a" are
to be considered the same subexpression, then we can choose f() so that
f(+,v1,v2) = f(+,v2,v1). (Generally you'd do this by exchanging f's second
and third arguments before computing it if, say, the second is larger than
the third.) Without knowing anything about your application, it's impossible
to say whether this trick might be useful.
-- Jerry